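We study the problem of multi-armed bandits with $\epsilon$-global Differential Privacy (DP). First, we prove minimax and problem-dependent regret lower bounds for stochastic and linear bandits that quantify the hardness of bandits with $\epsilon$-global DP. These bounds suggest the existence of two hardness regimes, depending on the privacy budget $\epsilon$. In the high-privacy regime (small $\epsilon$), the hardness depends on a coupled effect of privacy and partial information about the reward distributions. In the low-privacy regime (large $\epsilon$), bandits with $\epsilon$-global DP are no harder than bandits without privacy. For stochastic bandits, we further propose a generic framework for designing a near-optimal $\epsilon$-global DP extension of an index-based optimistic bandit algorithm. The framework consists of three ingredients: the Laplace mechanism, arm-dependent adaptive episodes, and the use of only the rewards collected in the last episode to compute private statistics. Specifically, we instantiate $\epsilon$-global DP extensions of the UCB and KL-UCB algorithms, namely AdaP-UCB and AdaP-KLUCB. AdaP-KLUCB is the first algorithm that both satisfies $\epsilon$-global DP and yields a regret upper bound that matches the problem-dependent lower bound up to multiplicative constants.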
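To make the first ingredient concrete, the following is a minimal sketch of the Laplace mechanism applied to the empirical mean of one arm's rewards. It is not the AdaP-UCB or AdaP-KLUCB implementation: the function name `private_mean`, the assumption that rewards lie in [0, 1], and the choice to pass in only the rewards of the most recent episode are illustrative assumptions, and the adaptive episode schedule itself is not shown.

```python
import numpy as np


def private_mean(last_episode_rewards, epsilon, rng):
    """Return an epsilon-DP estimate of the mean of rewards in [0, 1].

    Hypothetical helper for illustration only. Changing one reward in
    [0, 1] shifts the mean of n rewards by at most 1/n, so Laplace noise
    with scale 1 / (epsilon * n) gives epsilon-DP for this one release.
    """
    rewards = np.clip(np.asarray(last_episode_rewards, dtype=float), 0.0, 1.0)
    n = rewards.size
    if n == 0:
        raise ValueError("need at least one reward")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / (epsilon * n)  # sensitivity (1/n) divided by epsilon
    return float(rewards.mean() + rng.laplace(loc=0.0, scale=scale))


rng = np.random.default_rng(0)
rewards = rng.binomial(1, 0.7, size=200)  # simulated Bernoulli rewards
high_privacy = private_mean(rewards, epsilon=0.05, rng=rng)  # noisy estimate
low_privacy = private_mean(rewards, epsilon=50.0, rng=rng)   # close to the true mean
```

The noise scale shrinks as either `epsilon` or the number of rewards grows, which is one way to read the two regimes described above: for large $\epsilon$ the added noise is small next to the sampling error of the mean itself.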
Translated by Google Translate
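Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. When deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint, while the RL agent aims to maximize the constrained value function given the adversary's policy. The safety constraint on the agent's value function manifests only as a repulsion term between the agent's and the adversary's policies. Unlike previous approaches, SAAC can address different safety criteria such as safe exploration, mean-variance risk sensitivity, and CVaR-like coherent risk sensitivity. We illustrate the design of the adversary for these constraints. Then, in each of these variations, we show that the agent differentiates itself from the adversary's unsafe actions in addition to learning to solve the task. Finally, for challenging continuous control tasks, we demonstrate that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints than risk-averse distributional RL and risk-neutral soft actor-critic algorithms.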
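Typically, merit is defined with respect to some intrinsic measure of worth. We instead consider a setting where an individual's worth is relative: when a Decision Maker (DM) selects a set of individuals from a population to maximise expected utility, it is natural to consider the Expected Marginal Contribution (EMC) of each person to the utility. We show that this notion satisfies an axiomatic definition of fairness for this setting. We also show that for certain policy structures, this notion of fairness is aligned with maximising expected utility, while for linear utility functions it is identical to the Shapley value. However, for certain natural policies, such as those that select individuals with a specific set of attributes (e.g. high enough test scores for college admissions), there is a trade-off between meritocracy and utility maximisation. We analyse the effect of constraints on the policy on both utility and fairness, based on college admissions and outcomes in Norwegian universities.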
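The link between marginal contributions and the Shapley value for linear utilities can be checked on a toy example. The sketch below computes exact Shapley values by averaging each individual's marginal contribution over all orderings, then applies it to a hypothetical additive utility. The names `shapley_values`, `weights`, and `linear_utility` are illustrative assumptions, and the code does not implement the EMC under a general selection policy.

```python
from itertools import permutations


def shapley_values(n, utility):
    """Exact Shapley values for n individuals (only feasible for small n).

    `utility` maps a frozenset of selected indices to a number. Each
    individual's value is its marginal contribution averaged over all
    orderings in which the individuals could be added.
    """
    totals = [0.0] * n
    count = 0
    for order in permutations(range(n)):
        chosen = frozenset()
        for i in order:
            with_i = chosen | {i}
            totals[i] += utility(with_i) - utility(chosen)
            chosen = with_i
        count += 1
    return [t / count for t in totals]


# Hypothetical per-individual worths defining a linear (additive) utility.
weights = [1.0, 2.0, 3.0]


def linear_utility(selected):
    return sum(weights[i] for i in selected)


values = shapley_values(len(weights), linear_utility)
```

Under an additive utility, adding individual `i` to any set always raises the utility by `weights[i]`, so every marginal contribution, and hence the Shapley value, equals that individual's own weight. With a non-additive utility the marginal contribution depends on who else is selected, which is where a relative notion of worth becomes informative.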
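In this paper, we consider risk-sensitive sequential decision-making in Reinforcement Learning (RL). Our contributions are two-fold. First, we introduce a novel and coherent quantification of risk, namely composite risk, which quantifies the joint effect of aleatory and epistemic risk during the learning process. Existing works consider either aleatory or epistemic risk individually, or an additive combination of the two. We prove that the additive formulation is a special case of the composite risk when the epistemic risk measure is replaced with the expectation. Thus, the composite risk is more sensitive to both aleatory and epistemic uncertainty than the individual and additive formulations. Second, we propose an algorithm, SENTINEL-K, based on ensemble bootstrapping and distributional RL, which represent epistemic and aleatory uncertainty respectively. The ensemble of K learners uses Follow The Regularised Leader (FTRL) to aggregate the return distributions and obtain the composite risk. We experimentally verify that SENTINEL-K estimates the return distribution better and, when used with composite risk estimates, demonstrates higher risk-sensitive performance than state-of-the-art risk-sensitive and distributional RL algorithms.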
Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of the existing RL algorithms to solve such problems. We validate the proposed approach on a toy task, and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged effect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favorable qualitative behavior in our policy analysis.
Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, metaphorical comprehension of images remains relatively unexplored. Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor. We also collect high-quality and rich metaphor annotations (abstract objects, concepts, relationships along with their corresponding object boxes) as there do not exist any datasets that facilitate the evaluation of these tasks. We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations, highlighting strengths and weaknesses of current approaches in visual metaphor Classification, Localization, Understanding (retrieval, question answering, captioning) and gEneration (text-to-image synthesis) tasks. We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.
Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we observe that attribute binding and compositional capabilities remain major challenges, especially when multiple objects are involved. In this work, we improve the compositional skills of T2I models, specifically achieving more accurate attribute binding and better image compositions. To do this, we incorporate linguistic structures into the diffusion guidance process, based on the controllable properties of manipulating cross-attention layers in diffusion-based T2I models. We observe that keys and values in cross-attention layers have strong semantic meanings associated with object layouts and content. Therefore, we can better preserve the compositional semantics in the generated image by manipulating the cross-attention representations based on linguistic insights. Built upon Stable Diffusion, a SOTA T2I model, our structured cross-attention design is efficient and requires no additional training samples. We achieve better compositional skills in qualitative and quantitative results, leading to a 5-8% advantage in head-to-head user comparison studies. Lastly, we conduct an in-depth analysis to reveal potential causes of incorrect image compositions and justify the properties of cross-attention layers in the generation process.
Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic segmentation often focus on segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) Current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not help in transferability to unstructured texts. (b) Training from scratch with only a relatively small-sized dataset of the target unstructured domain improves the segmentation results by a significant margin.
We propose a novel deep neural network architecture to learn interpretable representations for medical image analysis. Our architecture generates a global attention for the region of interest, and then learns bag-of-words-style deep feature embeddings with local attention. The global and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cancer (GBC) detection from Ultrasound (USG) images. Our experiments indicate that the detection accuracy of our model exceeds even that of human radiologists, and advocate its use as a second reader for GBC diagnosis. The bag-of-words embeddings allow our model to be probed to generate interpretable explanations for GBC detection that are consistent with those reported in the medical literature. We show that the proposed model not only helps in understanding the decisions of neural network models but also aids in the discovery of new visual features relevant to the diagnosis of GBC. Source code and model will be available at https://github.com/sbasu276/RadFormer
Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts. Towards non-spurious and efficient prompt learning from limited examples, this paper presents a novel Counterfactual Prompt Learning (CPL) method for vision and language models, which simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework. In particular, CPL constructs counterfactuals by identifying the minimal non-spurious feature change between semantically similar positive and negative samples that causes a concept change, and learns a more generalizable prompt representation from both factual and counterfactual examples via contrastive learning. Extensive experiments demonstrate that CPL obtains better few-shot performance on different vision and language tasks than previous prompt tuning methods on CLIP. On image classification, we achieve a 3.55% average relative improvement on unseen classes across seven datasets; on image-text retrieval and visual question answering, we gain up to 4.09% and 25.08% relative improvements, respectively, across three few-shot scenarios on unseen test sets.